(19) 




Europäisches Patentamt
European Patent Office
Office européen des brevets





(12) 



(43) Date of publication: 

15.05.2002 Bulletin 2002/20 



(11) EP 1 206 104 A1

EUROPEAN PATENT APPLICATION 

(51) Int Cl.7: H04M 3/32



(21) Application number: 00203936.0

(22) Date of filing: 09.11.2000



(84) Designated Contracting States: 

AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU
MC NL PT SE TR
Designated Extension States: 
AL LT LV MK RO SI 

(71) Applicant: Koninklijke KPN N.V. 
9726 AE Groningen (NL) 

(72) Inventors: 

• Appel, Symon Ronald 
2627 AL Delft (NL) 



• Beerends, John Gerard 
4585 PB Hengstdijk (NL) 

• Hekstra, Andries Pieter 
2252 KM Voorschoten (NL) 

(74) Representative: Kruk, Wiggert Johan et al
Koninklijke KPN N.V.,
Intellectual Property Dep., 
P.O.Box 95321 
2509 CH Den Haag (NL) 






(54) Measuring a talking quality of a telephone link in a telecommunications network 



(57) For measuring the influence of noise on the talking quality of a
telephone link in a telecommunications network, a talker speech signal
(s(t)) and a degraded speech signal (s'(t)) are fed to an objective
measurement device (32) for obtaining an output signal (q) representing
an estimated value of the talking quality. The degraded signal includes
a returned signal (r(t)) originating from the network during
transmission of the talker speech signal over the telephone link. The
objective measurement carried out by the device is a modified PSQM-like
measurement, which is modified so as to include a modelling (32b) of
masking effects in consequence of noise present in the returned signal.
Preferably the modelling includes a noise suppression (42) carried out
on a difference signal (D(t,f)) in the loudness density domain using a
noise estimation (41).
s'(t) = s(t) ⊕ r(t)

Description

A. BACKGROUND OF THE INVENTION 

[0001] The invention lies in the area of measuring the quality of
telephone links in telecommunications systems. More in particular, it
concerns measuring a talking quality of a telephone link in a
telecommunications network, i.e. measuring the influence of returned
signals such as echo disturbances and side tone distortions on the
perceptual quality of a telephone link in a telecommunications system
as subjectively observed by a talker during a telephone call.

[0002] Such a method and a corresponding device are described in the
not pre-published international patent application PCT/EP00/08884
(Reference [1]; for more bibliographical details relating to the
references, see below under C.), which is incorporated by reference in
the present application. According to the described method and device
for measuring the influence of echo on the perceptual quality on the
talker's side of a telephone link in a telecommunications network, a
talker speech signal and a combined signal are fed to an objective
measurement device, such as a PSQM system, for obtaining an output
signal representing an estimated value of the perceptual talking
quality. The combined signal is a signal combination of a returned
signal originating from the network and corresponding to the talker
speech signal, and the talker speech signal itself. The described
technique has the following problem. In case the returned signal
contains signal components not directly related to the voice of the
talker, like noise present in the telephone system, noise derived from
the background noise of the talker at the other side of the telephone
connection, or noise derived from interfering signals, such signal
components may have a so-called masking effect on the echo, which then
results in an increase of the subjectively perceived talking quality.
Objective measurement systems such as those based on the Perceptual
Speech Quality Measurement (PSQM) model, recommended by the ITU-T
Recommendation P.861 (see Reference [2]), however, will generally
interpret noise components in terms of a decrease in quality. One may
try to solve this problem by using noise suppression techniques as
generally known in the world of speech processing (see e.g. References
[3] to [6]). However, these known suppression techniques were developed
for optimising listening quality, and are not suited for the
measurement and optimisation of talking quality. Talking quality
differs from listening quality, especially in the effect of masking
noise and masking by one's own voice. Noise in general decreases
listening quality but increases talking quality.

B. SUMMARY OF THE INVENTION 

[0003] The main object of the present invention is to provide an
improved objective measurement method and corresponding device for
measuring a talking quality of a telephone link in a telecommunications
network, i.e. for measuring the influence of returned signals such as
echo and side tone distortion, including the influence of noise, on the
perceptual quality on the talker's side of the telephone link, which do
not have said problem.
[0004] A method for measuring a talking quality of a telephone link in
a telecommunications network according to the preamble of claim 1, as
described in Reference [1], is, according to the invention,
characterised as in claim 1.

[0005] A device for measuring a talking quality of a telephone link in
a telecommunications network according to the preamble of claim 10, as
described in Reference [1], is, according to the invention,
characterised as in claim 10.

[0006] The invention is based on the appreciation that objective
measurement systems such as PSQM, which are covered by the
above-mentioned Recommendation P.861, have been developed for measuring
the listening quality of speech signals. Therefore, in order to provide
a similar objective measurement for measuring the talking quality of a
telephone link, the step of modelling echo masking effects is
introduced in the objective measurement method and device.
[0007] According to the Recommendation P.861, at first a speech signal,
which is an output signal of an audio or speech processing or
transporting system, and of which the signal quality has to be
assessed, and a reference signal are mapped to representation signals
of a psycho-physical perception model of the human auditory system.
These representation signals are in fact the compressed loudness
density functions of the speech and reference signals. Then two
operations, which imply an asymmetry processing and a silent interval
weighting in order to model two cognitive effects, are carried out on a
difference signal of the two representation signals in order to produce
the quality signal, which is a measure for the auditory perception of
the speech signal to be assessed. However, it is known that noise in
the echo signal, especially background noise originating at the side of
the B subscriber of the telephone link, can have a masking effect on
the echo signal, thus leading to an improvement of the subjectively
perceived talking quality. It was then realised that, in the operations
carried out on the difference signal in the algorithm of the
Recommendation P.861, noise in the echo signal will be interpreted as
an introduced distortion, leading to a deterioration of the objectively
measured talking quality, and that these operations should therefore be
modified and/or supplemented by a step of modelling echo masking
effects of noise.

[0008] Therefore a preferred embodiment of the method and of the device
of the present invention are characterised according to claim 2 and
claim 11, respectively.

[0009] Further preferred embodiments of the method and the device of
the invention are summarised in the various subclaims.

C. REFERENCES 
[0010] 

[1] PCT/EP00/08884 (of applicant; filing date: 08.09.2000);

[2] ITU-T Recommendation P.861: Objective quality measurement of
telephone-band (300-3400 Hz) speech codecs, August 1996;

[3] R. Le Bouquin, "Enhancement of Noisy Speech Signals: Applications
to Mobile Radio Communications", Speech Communication, vol. 18,
pp. 3-19 (1996);

[4] J.-H. Chen and A. Gersho, "Adaptive Postfiltering for Quality
Enhancement of Coded Speech", IEEE Trans. on Speech and Audio
Processing, vol. 3, pp. 59-71 (1996 Jan);

[5] D. E. Tsoukalas, J. Mourjopoulos and G. Kokkinakis, "Perceptual
Filters for Audio Signal Enhancement", J. Audio Eng. Soc., vol. 45,
pp. 22-36 (1997 Jan/Feb);

[6] F. Xie and D. van Compernolle, "Speech Enhancement by Spectral
Magnitude Estimation - A unifying Approach", Speech Communication,
vol. 19, pp. 89-104 (1996).

[0011] All references are considered to be incorporated into the
present application.

D. BRIEF DESCRIPTION OF THE DRAWING 

[0012] The invention will be further explained by 
means of the description of exemplary embodiments, 
reference being made to a drawing comprising the fol- 
lowing figures: 

FIG. 1 schematically shows an example of a usual
telephone link in a telecommunications network;

FIG. 2 schematically shows an earlier described set- 
up for measuring a talking quality of a tele- 
phone link using a known objective measure- 
ment technique for measuring a perceptual 
quality of speech signals; 

FIG. 3 schematically shows a device for an objective 
measurement of a talking quality of a tele- 
phone link according to the invention to be 
used in the set-up of FIG. 2; 

FIG. 4 shows a flow diagram of the detailed opera- 
tion of a part of the device shown in FIG. 3; 

FIG. 5 schematically shows a modification in a fur- 
ther part of the device shown in FIG. 3. 

E. DESCRIPTION OF EXEMPLARY EMBODIMENTS 

[0013] Delay and echo play an increasing role in the quality of
telephony services because modern wireless and/or packet based network
techniques, like GSM, UMTS, DECT, IP and ATM, inherently introduce more
delay than the classical circuit switching network techniques like SDH
and PDH. Delay and echo together with the side tone determine how a
talker perceives his own voice in a telephone link. The quality with
which he perceives his own voice is defined as the talking quality. It
should be distinguished from the listening quality, which deals with
how a listener perceives other voices (and music). Talking and
listening quality together with the interaction quality determine the
conversational quality of a telephone link. Interaction quality is
defined as the ease of interacting with the other party in a telephone
call, dominated by the delay in the system and the way it copes with
double talk situations. The present invention is related to the
objective measurement of talking quality of a telephone link, and more
particularly to accounting for the influence of noise therein.
[0014] FIG. 1 schematically shows an example of a usual telephone link
established between an A subscriber and a B subscriber of a
telecommunications network 10. Telephone sets 11 and 12 of the A
subscriber and the B subscriber, respectively, are connected by way of
two-wire connections 13 and 14 and four-wire interfaces, namely,
hybrids 15 and 16, to the network 10. Through the network, the
established telephone link has a forward channel including a two-wire
part, i.e. two-wire connections 13 and 14, and a four-wire send part
17, over which speech signals from the A subscriber are conducted, and
a return channel including a two-wire part, i.e. two-wire connections
14 and 13, and a four-wire receive part 18, over which speech signals
from the B subscriber are conducted. A speech signal s striking the
microphone M of the telephone set 11 of the A subscriber is passed on,
by way of the forward channel (13, 17, 14) of the telephone link, to
the earphone R of telephone set 12, and becomes audible there for the B
subscriber as a speech signal s" affected by the network. Each speech
signal s(t) on the forward channel generally causes a returned signal
r(t) which, particularly due to the presence of said hybrids, includes
an electrical type of echo signal on the return channel (18, 13) of the
telephone link, and this is passed on to the earphone R of the
telephone set 11, and may therefore disturb the A subscriber there.
Furthermore the acoustic and/or mechanical coupling of the earphone or
loudspeaker signal to the microphone of the telephone set of the B
subscriber may cause an acoustic type of echo signal back to the
telephone set of the A subscriber, which contributes to the returned
signal. In an end-to-end digital telephone link (such as in a GSM
system or in a Voice-over-IP system) such an acoustic echo signal is
the only type of echo signal that contributes to the returned signal.
Summarizing, a returned signal r(t) may include, at various stages in
the return channel of a telephone link, as caused by a speech signal
s(t) in the forward channel of the telephone link:

- a signal r1 representing acoustic echo;
- a signal r2 representing an electrical echo, possibly in combination
  with the acoustic echo;
- a signal r3 which represents the signal r2 as affected, i.e. delayed
  or distorted, by the network 10;
- a signal r4 which represents the signal r3 in combination with a side
  tone signal; and
- a signal r5 which is an acoustic signal derived from the signal r4,
  that also includes the locally generated side tone.

[0015] FIG. 2 shows schematically a set-up for measuring a talking
quality of a telephone link using a known objective measurement
technique for measuring a perceptual quality of speech signals, as
described in reference [1]. The set-up comprises a system or
telecommunications network under test 20, hereinafter for briefness'
sake referred to as network 20, and a system 22 for the perceptual
analysis of speech signals offered, hereinafter for briefness' sake
only designated as PSQM system 22. Any talker speech signal s(t) is
used, on the one hand, as an input signal of the network 20 and, on the
other hand, as first input (or reference) signal of the PSQM system 22.
A returned signal r(t) obtained from the network 20, which corresponds
to the input talker speech signal s(t), is combined, in a combination
circuit 24, with the talker speech signal s(t) to provide a combined
speech signal s'(t), which is then used as a second input (or degraded)
signal of the PSQM system. If necessary, the signal s(t) is scaled to
the correct level before being combined with the returned signal r(t)
in the combination circuit. An output signal q of the PSQM system 22
represents an estimate of the talking quality, i.e. of the perceptual
quality of the telephone link through the network 20 as it is
experienced by the telephone user while talking on his own telephone
set. Here use may be made of signals stored in databases. These signals
may be obtained or have been obtained by simulation or from a telephone
set (e.g. signal r4 in the electrical domain or signal r5 in the
acoustic domain) of the A subscriber in the event of an established
link during speech silence of the B subscriber. The two-wire connection
between the telephone subscriber access point and the four-wire
interface with the network does not, or hardly, contribute to the echo
component in the returned signal r(t) (of course, it does contribute to
the echo component in a returned signal occurring in the return channel
of the B subscriber of the telephone link). However, any such signal
contribution has a short delay and, as a matter of fact, forms part of
the side tone.
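
As an illustration of the combination circuit 24, the following minimal
sketch forms s'(t) from s(t) and r(t). It assumes that the signal
combination is a plain sample-wise addition and that the scaling of
s(t) is a single gain factor; the names combine_signals and gain are
hypothetical and do not appear in the source.

```python
def combine_signals(s, r, gain=1.0):
    """Return s'(t) = gain * s(t) + r(t), sample by sample.

    Assumption: the combination is a plain addition and the level
    scaling of s(t) is a single constant gain.
    """
    if len(s) != len(r):
        raise ValueError("s and r must have the same number of samples")
    return [gain * a + b for a, b in zip(s, r)]


# Toy input: two samples of talker speech and of the returned signal.
degraded = combine_signals([1.0, 2.0], [0.5, -0.5], gain=0.5)
```

The result is the signal that would be offered as the second
(degraded) input of the measurement system.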
[0016] The signals s(t) and r(t) may also be tapped off from the
four-wire part 17 of the forward channel and the four-wire part 18 of
the return channel near the four-wire interface 15, respectively. This
offers, as already described in reference [1], the opportunity of a
permanent measurement of the talking quality in the event of
established telephone links, using live traffic non-intrusively.



[0017] The system or network being tested may of course also be a
simulation system, which simulates a telecommunications network.

[0018] The described technique has, however, the following problem.
Since a system or network under test generally will not be ideal, any
returned signal r(t) will also contain signal components not directly
related to the voice of the talker, like noise present in the telephone
system, noise derived from the background noise of the listener at the
other side of the telephone connection, or noise derived from
interfering signals. In such a case these signal components may have a
so-called masking effect on the echo, which then results in an increase
of the talking quality. Objective measurement systems like PSQM,
however, which up to now have been developed for assessing the
listening quality of speech signals, will interpret such noise
components in terms of a decrease in quality. In the following a method
and a device are described which in essence imply a modification of a
PSQM-like algorithm as recommended by the ITU-T Recommendation P.861,
in order to avoid the problem and to make the existing algorithm
suitable for objectively measuring the talking quality with a higher
correlation with a subjectively measured talking quality, when used in
a set-up as shown in FIG. 2, than without the modification.

[0019] FIG. 3 shows schematically a measuring device for objectively
measuring the perceptual quality of an audible signal. The device
comprises a signal processor 31 and a combining arrangement 32. The
signal processor is provided with signal inputs 33 and 34, and with
signal outputs 35 and 36 coupled to corresponding signal inputs of the
combining arrangement 32. A signal output 37 of the combining
arrangement 32 is at the same time the signal output of the measuring
device. The signal processor includes perception modelling means 38 and
39, respectively coupled to the signal inputs 33 and 34, for processing
input signals s(t) and s'(t) and generating representation signals
R(t,f) and R'(t,f) which form time/frequency representations of the
input signals s(t) and s'(t), respectively, according to a perception
model of the human auditory system. The representation signals are
functions of time and frequency (Hz scale or Bark scale). The signal
processing, as usual, is carried out frame-wise, i.e. the speech
signals are split up into frames that are about equal to the window of
the human ear (between 10 and 100 ms) and the loudness per frame is
calculated on the basis of the perception model. Only for reasons of
simplicity this frame-wise processing is not indicated in the figures.
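
The frame-wise splitting mentioned above can be sketched as follows.
This is only an illustration: the frame length of 32 ms is an assumed
value inside the stated 10 to 100 ms range, the frames do not overlap,
and the function name split_into_frames is hypothetical.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=32.0):
    """Split a sampled signal into consecutive, non-overlapping frames.

    Assumptions: frame_ms is an illustrative value within the 10-100 ms
    range; a trailing partial frame is dropped.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len]
            for i in range(0, last_start + 1, frame_len)]


# 100 ms of an 8 kHz signal yields three complete frames of 256 samples.
frames = split_into_frames(list(range(800)), 8000)
```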

[0020] The representation signals R(t,f) and R'(t,f) are passed to the
combining arrangement 32 via the signal outputs 35 and 36. In the
combining arrangement of the known PSQM-like algorithm at first a
difference signal of the representation signals is determined, followed
by various processing steps carried out on the difference signal. The
last ones of the various processing steps imply integration steps over
frequency and time resulting in a quality signal q available at the
signal output 37.

[0021] In case of determining a listening quality the input signal
s'(t) is an output signal of an audio or speech signal processing or
transporting system, of which the signal processing or transporting
operation is assessed, while the input signal s(t), being the
corresponding input signal of the system to be assessed, is used as
reference signal. For determining a talking quality, however, where, as
described with reference to FIG. 2, the input signal s'(t) is a
combination of the signal s(t) and the returned signal r(t), the known
combining arrangement should be modified.

[0022] According to the recommended PSQM-like algorithm (see reference
[2], more particularly FIGURE 3/P.861) the various processing steps
carried out by (within) the combining arrangement include asymmetry
processing and silent interval weighting steps for modelling some
perceptual effects. It is known that noise in the echo signal,
especially background noise originating at the side of the B subscriber
of the telephone link, has a masking effect on the echo signal, thus
leading to an improvement of the subjectively perceived talking
quality. It was then realised, however, that the steps for modelling
the cognitive effects in the algorithm, in which noise in the echo
signal will be interpreted as an introduced distortion, would lead to a
deterioration of the objectively measured talking quality, and
therefore could not be maintained as such.

[0023] Instead, for correctly measuring the talking quality, a step of
modelling masking effects which noise present in the returned signal
could have on perceived echo disturbances is introduced. Such a
modelling step could be based on a possible separation of echo
components and noise components present in the returned signal r(t).
However, a reliable modelling could be reached in a different, simpler
manner. This modelling step implies a specific noise suppression step
carried out on the difference signal by using an estimated value for
the noise. Therefore the combining arrangement 32 comprises:

- in a first part 32a, a subtraction means 40 for perceptually
  subtracting the two representation signals R(t,f) and R'(t,f)
  received from the signal processor 31 and generating a difference
  signal D(t,f),
- in a second part 32b, a noise estimating means 41 for generating an
  estimated noise value Ne for the noise present in the input signal
  s'(t), and a noise suppression means 42 for deriving from the
  difference signal D(t,f) and the estimated noise value Ne a modified
  difference signal D'(t,f), and
- in a third part 32c, integration means 43 for integrating the
  modified difference signal D'(t,f) successively over frequency and
  time and generating the quality signal q.

[0024] The estimated noise value Ne may be a predetermined value, e.g.
derived from the type of telephone link, or is preferably obtained from
one of the representation signals, i.e. R'(t,f), which is visualised in
FIG. 3 by means of a dashed line connecting the signal output 36 with a
signal input 44 of the noise estimating means 41. The representation
signals R(t,f) and R'(t,f) are as usual loudness density functions of
the reference and degraded speech signals s(t) and s'(t), respectively.
The output signal of the subtraction means 40, i.e. D(t,f), represents
the signed difference between the loudness densities of the degraded
signal (i.e. distorted by the presence of echo, side tone and noise
signals in the returned signal) and the reference signal (i.e. the
original talker speech signal), preferably reduced by a small
perceptual correction, i.e. a small density correction for so-called
internal noise.
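
A minimal sketch of the subtraction carried out by the subtraction
means 40 is given below. It assumes that the signed difference is taken
as degraded minus reference, that both representation signals are
given as lists of frames each holding one loudness density per
frequency band, and it leaves out the small internal-noise correction,
whose exact form is not specified here. The name difference_signal is
hypothetical.

```python
def difference_signal(rep_ref, rep_deg):
    """Return D(t,f) = R'(t,f) - R(t,f) for every frame t and band f.

    Assumptions: sign convention degraded minus reference; the
    internal-noise correction is omitted.
    """
    if len(rep_ref) != len(rep_deg):
        raise ValueError("both signals need the same number of frames")
    diff = []
    for ref_frame, deg_frame in zip(rep_ref, rep_deg):
        if len(ref_frame) != len(deg_frame):
            raise ValueError("frames need the same number of bands")
        diff.append([d - r for r, d in zip(ref_frame, deg_frame)])
    return diff


# One frame with two bands: positive where the degraded signal is louder.
d = difference_signal([[1.0, 2.0]], [[1.5, 1.0]])
```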

[0025] The resulting difference signal D(t,f), which is in fact a
loudness density function, is subjected to a background masking noise
estimation. The key idea behind this is that, because talkers during a
telephone call will always have silent intervals in their speech,
during such intervals (of course after the echo delay time) the minimum
loudness of the degraded signal over time is almost completely caused
by the background noise. Since the speech signal processing is carried
out in frames, this minimum may be put equal to a minimum loudness
density Ne found in the frames of the representation signal R'(t,f).
This minimum Ne can then be used to define a threshold value T(Ne) for
setting the content of all frames of the difference signal D(t,f) that
have a loudness below this threshold to zero, leaving the content of
the other frames unchanged. The set-to-zero frames and the unchanged
frames together constitute a signal from which the modified difference
signal D'(t,f), the output signal of the noise suppression means 42, is
derived (see below). Consequently, the standard Hoth background masking
noise, used in the main step of the PSQM-like algorithm of deriving the
representation signals, has to be omitted from the algorithm.
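
The noise estimation described above can be sketched as follows. The
sketch assumes that R'(t,f) is available as a list of frames, each
holding one loudness density per frequency band, and that the
integration over frequency is a plain sum over the bands (any band
width weighting is left out). The function names are hypothetical.

```python
def frame_loudness(rep_deg):
    """Integrate R'(t,f) over frequency, giving one loudness R'(t) per frame.

    Assumption: the integration is a plain sum over the bands.
    """
    return [sum(frame) for frame in rep_deg]


def estimate_noise(rep_deg):
    """Return Ne, the minimum frame loudness found in R'(t)."""
    loudness = frame_loudness(rep_deg)
    if not loudness:
        raise ValueError("at least one frame is required")
    return min(loudness)


# Three frames; the quiet middle frame stands for a silent interval.
ne = estimate_noise([[1.0, 2.0], [0.25, 0.25], [3.0, 1.0]])
```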
[0026] FIG. 4 shows schematically, by means of a flow diagram, the
modelling step in more detail, as carried out on the difference signal
D(t,f) by the noise suppression means 42 using the estimated noise
value Ne produced by the noise estimating means 41. Again it is
emphasized that the signal processing is understood to be frame-wise,
although for the sake of simplicity this is not indicated in the
figures. The flow diagram includes the following boxes:

- box 45 indicating a step of integrating the representation signal
  R'(t,f), as produced by the signal processor 31 via output 36, over
  frequency, resulting in a loudness degraded signal R'(t);

- box 46 indicating a step of determining the estimated noise value Ne
  for the noise present in the loudness degraded signal R'(t), Ne being
  equal to the minimum value of the loudness found in the loudness
  degraded signal R'(t);

- boxes 47, 48 and 49 indicating a step of subjecting the difference
  signal D(t,f) to a criterion C by means of which a thresholded
  difference signal Dc(t,f) is derived from the difference signal, box
  48 indicating that Dc(t,f) = D(t,f) for frames in which the loudness
  of the frames in the loudness degraded signal R'(t) satisfies the
  criterion C, and box 49 indicating that Dc(t,f) = 0 for frames in
  which the loudness of the frames in the loudness degraded signal
  R'(t) does not satisfy the criterion C;

- box 50 indicating a step of determining from the thresholded
  difference signal Dc(t,f) the modified difference signal D'(t,f) by
  calculating a distortion loudness to signal loudness ratio (DSR) of
  the thresholded difference signal Dc(t,f) and the loudness degraded
  signal R'(t), i.e. D'(t,f) = DSR(t,f).

[0027] Experimentally a suitable criterion C appeared to be whether or
not the loudness of the frames in the loudness degraded signal R'(t) is
larger than or equal to the threshold value T(Ne), said threshold value
being chosen as a constant factor Cf times the estimated value Ne, i.e.
T(Ne) = Cf·Ne. A suitable value for the constant factor appeared to be
Cf = 1.6.
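
The criterion C can be sketched as a per-frame thresholding. The
sketch assumes that D(t,f) is a list of frames with one value per
frequency band and that the frame loudness R'(t) and the estimate Ne
have already been computed; the function name threshold_difference is
hypothetical.

```python
def threshold_difference(diff, degraded_loudness, ne, c_f=1.6):
    """Return Dc(t,f): frames below T(Ne) = c_f * ne are set to zero.

    diff              -- D(t,f), a list of frames (lists of band values)
    degraded_loudness -- R'(t), one loudness value per frame
    ne                -- estimated noise value Ne
    """
    if len(diff) != len(degraded_loudness):
        raise ValueError("one loudness value per frame is required")
    threshold = c_f * ne
    return [list(frame) if loud >= threshold else [0.0] * len(frame)
            for frame, loud in zip(diff, degraded_loudness)]


# With Ne = 1.0 the threshold is 1.6, so only the last frame survives.
dc = threshold_difference([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                          [1.0, 1.5, 2.0], ne=1.0)
```

Frames whose loudness equals the threshold are kept, in line with the
"larger than or equal to" wording of the criterion.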
[0028] In calculating the DSR of the difference signal a clipping is
carried out by introducing a threshold on the signal loudness, below
which the signal loudness is set to that threshold. In an optimisation
a threshold value of 4 Sone was found.
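
One possible reading of the DSR calculation with clipping is sketched
below: each value of the thresholded difference signal is divided by
the loudness of its frame, where a frame loudness below the clipping
threshold is first raised to that threshold. The exact form of the
ratio is an assumption, and the name distortion_to_signal_ratio is
hypothetical.

```python
def distortion_to_signal_ratio(dc, degraded_loudness, floor_sone=4.0):
    """Return DSR(t,f) = Dc(t,f) / max(R'(t), floor_sone).

    Assumption: the ratio is taken per frame against the frame
    loudness R'(t), clipped from below at floor_sone.
    """
    if len(dc) != len(degraded_loudness):
        raise ValueError("one loudness value per frame is required")
    return [[value / max(loud, floor_sone) for value in frame]
            for frame, loud in zip(dc, degraded_loudness)]


# The first frame (loudness 1 Sone) is clipped to 4 Sone before dividing.
dsr = distortion_to_signal_ratio([[2.0, 4.0], [8.0, 16.0]], [1.0, 8.0])
```

The clipping keeps the ratio from growing without bound in very quiet
frames.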

[0029] Finally the modified difference signal D'(t,f) is integrated by
means of the integration means 43, at first over frequency using an Lp
norm (i.e. the generally known Lebesgue p-averaging function or
Lebesgue p-norm) with p=0.8, and then over time using an Lp norm with
p=6, resulting in the output value q for the talking quality.
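
The two-stage integration can be sketched as follows. The sketch
assumes the averaging form of the Lp norm, (1/N · Σ|x|^p)^(1/p), with
equal weight for every band and every frame; the function names are
hypothetical.

```python
def lp_average(values, p):
    """Return the Lp average (mean of |x|**p) ** (1/p) of the values."""
    if not values:
        raise ValueError("at least one value is required")
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def integrate_quality(d_mod, p_freq=0.8, p_time=6.0):
    """Integrate D'(t,f) over frequency (p_freq), then over time (p_time)."""
    per_frame = [lp_average(frame, p_freq) for frame in d_mod]
    return lp_average(per_frame, p_time)


# Two frames with two bands each.
q = integrate_quality([[0.5, 1.0], [1.0, 2.0]])
```

With p=6 over time the frames with the largest disturbance dominate
the result, while p=0.8 over frequency gives relatively more weight to
small values spread over many bands.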
[0030] The quality output values of a thus modified objective
measurement method and device for assessing the talking quality, as
experimentally obtained for seven databases of test speech signals,
showed high correlations (above 0.93) with the mean opinion scores
(MOS) of the subjectively perceived talking quality.

[0031] For the measuring of the talking quality it is necessary that
the representation signal R'(t,f) is a representation of the signal
combination of the talker speech signal and the returned signal. To
realise this, however, it is not necessary that the degraded signal
s'(t) is a signal combination of these two signals as indicated in
FIG. 2 (signal combinator 24) and in FIG. 3 (s'(t) = s(t) ⊕ r(t)). It
is also possible to use the returned signal (r(t)) as the degraded
signal (s'(t)) and to obtain an intermediate signal in an intermediate
stage of processing the reference signal, as carried out by the
perception modelling means 38, which then is combined with a
corresponding intermediate signal (Ps'(f)) obtained in a corresponding
intermediate stage of processing the degraded signal, as carried out by
the perception modelling means 39. Preferably the intermediate signal
is a Fast Fourier Transform power representation (Ps(f)) of the
reference speech signal (s(t)). This modification is shown
schematically in more detail in FIG. 5. The perception modelling means
38 and 39 carry out in a first stage of processing as usual (see
reference [2]), respectively indicated by boxes 51 and 52, a step of
applying a Hanning window (HW) followed by a step of determining a Fast
Fourier Transform (FFT) power representation in order to produce the
intermediate signals Ps(f) and Pr(f), which are FFT power
representations of the talker speech signal s(t) and the degraded
signal s'(t), which now equals the returned signal r(t), respectively.
In a second stage of processing, respectively indicated by boxes 53 and
54, a step of frequency warping (FW) to pitch scale is carried out,
followed by steps of frequency smearing (FS) and intensity warping
(IW), in order to produce the representation signals R(t,f) and
R'(t,f). Between the first and second stages, as indicated by the boxes
52 and 54, an intermediate signal addition of the intermediate signals
Ps(f) and Pr(f), indicated by signal adder 55, is carried out, the
resulting intermediate signal sum being the input of the second
processing stage (box 54). Before the intermediate signal addition can
be applied, the intermediate signal Ps(f) has to be scaled to the
correct level as usual.
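
The first processing stage and the intermediate signal addition can be
sketched for a single frame as follows. This is an illustration under
stated assumptions: a periodic Hann window is used, the power
representation is computed with a direct DFT instead of an optimised
FFT, the scaling of Ps(f) is a single factor, and all function names
are hypothetical.

```python
import cmath
import math


def hann_window(n):
    """Return a periodic Hann window of length n (assumed window form)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def power_spectrum(frame):
    """Window one frame and return its power for bins 0 .. n//2."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann_window(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; an FFT would give the same values faster.
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def add_power_spectra(ps, pr, scale=1.0):
    """Return scale * Ps(f) + Pr(f), the input of the second stage."""
    if len(ps) != len(pr):
        raise ValueError("both spectra need the same number of bins")
    return [scale * a + b for a, b in zip(ps, pr)]


# One 8-sample frame of talker speech and of the returned signal.
ps = power_spectrum([1.0] * 8)
pr = power_spectrum([0.0] * 8)
combined = add_power_spectra(ps, pr)
```

Adding power representations instead of waveforms is what makes the
external combination of s(t) and r(t) unnecessary in this variant.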

[0032] Consequently, when using such an intermediate signal addition
(Ps(f) ⊕ Pr(f)) inside the perception modelling means, instead of the
external addition (s'(t) = s(t) ⊕ r(t)), the combination circuit 24
becomes superfluous. In case a device as described with reference to
FIG. 3, having included the modification as described with reference to
FIG. 5, is used directly in a telephone link, in a way as already
described in reference [1], then the input ports 33 and 34 of the
device may be directly coupled to the four-wire parts 17 and 18 of the
forward and return channel, respectively, of a telephone link.



Claims 

1. Method for measuring a talking quality of a telephone link in a
telecommunications network,
the method comprising a main step of subjecting a degraded speech
signal s'(t) with respect to a reference speech signal s(t) to an
objective measurement technique (32) for measuring a perceptual quality
of speech signals, and producing a quality signal (q) which represents
an estimated value concerning the talking quality, the reference speech
signal being a talker speech signal (s(t)) and the degraded speech
signal including a returned signal r(t), the returned signal being a
signal which occurred or may occur in a return channel of the telephone
link during the transmission of the talker speech signal in a forward
channel of the telephone link,

characterised in that the main step is carried out by means of an
objective measurement technique which includes a step of modelling
masking effects in consequence of noise present in the returned signal.

2. Method according to claim 1, characterised in that

the main step comprises:

a first processing step of processing the degraded speech signal
(s'(t)) and generating a first representation signal (R'(t,f)),
a second processing step of processing the talker speech signal (s(t))
and generating a second representation signal (R(t,f)), and

a combining step of combining the first and second representation
signals so as to produce said output signal (q),

the first representation signal (R'(t,f)) being a representation signal
of a signal combination of the talker speech signal and the returned
signal, and the combining step including said step of modelling masking
effects in consequence of noise present in the returned signal.

25 

3. Method according to claim 2, characterised in
that

the combining step includes:

a step of subtracting (32a) the first representation
signal from the second representation
signal as to produce a difference signal
(D(t,f)),
said step of modelling (32b) the masking
effects of noise carried out on the difference
signal as to produce a modified difference
signal, and
a step of integrating (32c) the modified difference
signal with respect to frequency
and time as to produce the quality signal,

the modelling step including:

a first substep of producing (41) an estimated
value (Ne) of the loudness of the noise
present in the returned signal, and
a second substep of noise suppression
(42; 46) carried out on the difference signal
using said produced estimated value (Ne)
as to produce the modified difference signal
(D'(t,f)).

4. Method according to claim 3, characterised in
that the second substep of noise suppression includes
further substeps of:

deriving (46) from the estimated value (Ne) a
loudness criterion (C),
setting (47, 48, 49) distortions in the loudness
(domain) of the difference signal, which do not
suffice the criterion, to zero in the loudness (domain)
of a thresholded difference signal (Dc(t,f)), and
deriving (50) the modified difference signal
(D'(t,f)) by calculating a distortion loudness to signal
loudness ratio (DSR(t,f)) of the thresholded
signal (Dc(t,f)) with respect to a loudness degraded
signal (R'(t)) derived from the first representation
signal (R'(t,f)).

5. Method according to any of the claims 2 to 4, characterised
in that the estimated value of the noise
loudness is derived from the first representation signal
(R'(t,f)).

6. Method according to any of the claims 2 to 5, characterised
in that the degraded signal (s'(t)) is a signal
combination of the talker speech signal (s(t))
and the returned signal (r(t)).

7. Method according to any of the claims 2 to 5, characterised
in that the returned signal (r(t)) is used
as the degraded signal (s'(t)) and that an intermediate
signal (Ps(f)) obtained during an intermediate
stage of the second processing step of processing
the reference signal is combined with a corresponding
intermediate signal (Ps'(f)) obtained during a
corresponding intermediate stage of the first
processing step of processing the degraded signal.

8. Method according to claim 7, characterised in
that the intermediate signal is a Fast Fourier
Transform power representation (Ps(f)) of the reference
speech signal (s(t)).

9. Method according to any of the claims 1 to 8, characterised
in that the talker speech signal and the
returned signal are taken off from an established
telephone link.

10. Device for measuring a talking quality of a telephone
link in a telecommunications network (10),
the device comprising measurement means (22;
31, 36) for subjecting a degraded speech signal s'(t)
with respect to a reference speech signal s(t) to
an objective measurement technique for measuring
a perceptual quality of speech signals, and producing
a quality signal (q) which represents an estimated
value concerning the talking quality, the reference
speech signal being a talker speech signal (s(t))
and the degraded speech signal including a returned
signal r(t), the returned signal being a signal
which occurred or may occur in a return channel of
the telephone link during the transmission of the
talker speech signal in a forward channel of the
telephone link,

characterised in that the measurement means include
means (32b) for a modelling of masking effects
in consequence of noise present in the returned
signal.

11. Device according to claim 10, characterised in
that the device comprises:

first processing means (39) for processing the
degraded speech signal (s'(t)) and generating
a first representation signal (R'(t,f)),
second processing means (38) for processing
the talker speech signal (s(t)) and generating a
second representation signal (R(t,f)), and
combining means (32) for combining the first
and second representation signals as to produce
said output signal (q), the combining
means including said means (32b) for modelling
the masking effects.

12. Device according to claim 11, characterised in
that

the combining means include:

subtracting means (40) for subtracting the
first representation signal from the second
representation signal as to produce a difference
signal (D(t,f)),
said modelling means (41, 42) for modelling
the masking effects carried out on the
difference signal as to produce a modified
difference signal, and
integrating means (43) for integrating the
modified difference signal with respect to
frequency and time as to produce the quality
signal,

the modelling means include:

means (41) for producing an estimated value
(Ne) of the loudness of the noise
present in the returned signal, and
means (42) for carrying out a noise suppression
on the difference signal using said
produced estimated value (Ne), and for
producing the modified difference signal
(D'(t,f)),

the first representation signal (R'(t,f)) being a
representation signal of a signal combination of
the talker speech signal and the returned signal.

12. Device according to claim 11, characterised
in that the device includes a signal combinator (24)
for combining the talker speech signal (s(t)) and the
returned signal (r(t)) as to form the degraded signal
(s'(t)).

13. Device according to claim 11, characterised in
that the device includes an intermediate signal
combination means (55) for combining an intermediate
signal (Ps(f)) obtained in an intermediate
stage of the second processing means (38) with a
corresponding intermediate signal (Ps'(f)) obtained
in a corresponding intermediate stage of the first
processing means (39), the degraded signal (s'(t))
being the returned signal (r(t)).

14. Device according to claim 13, characterised in
that the intermediate signal combination means
(55) is included in the first processing means (39)
after means (FFT) for performing a Fast Fourier
Transform.
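
The subtraction, noise suppression and integration of claims 3 and 4 can be sketched roughly as follows. The loudness densities R(t,f) and R'(t,f) are given as lists of frames. Several choices below are assumptions and are not fixed by the claims: the magnitude of the difference is used (which sidesteps the sign convention), the noise estimate Ne is taken from the quietest frame of the degraded signal, the criterion C is a hypothetical multiple `factor` of Ne, and the integration is a plain mean over time and frequency.

```python
def noise_suppressed_quality(R, R_deg, factor=2.0, eps=1e-12):
    """Sketch of subtraction, noise suppression and integration.

    R, R_deg: loudness densities R(t,f) and R'(t,f) as lists of frames.
    factor, eps: hypothetical tuning constants, not taken from the source.
    """
    n_bands = len(R[0])
    # D(t,f): magnitude of the difference between the two representations
    D = [[abs(rd - r) for r, rd in zip(fr, fd)] for fr, fd in zip(R, R_deg)]
    # R'(t): loudness of the degraded signal per frame (integration over f)
    loud = [sum(fd) for fd in R_deg]
    # Ne: assumed noise estimate, quietest frame averaged per band
    Ne = min(loud) / n_bands
    # C: assumed loudness criterion derived from Ne
    C = factor * Ne
    # Dc(t,f): keep D where it meets the criterion, otherwise set to zero
    Dc = [[d if d >= C else 0.0 for d in row] for row in D]
    # D'(t,f) = DSR(t,f): distortion loudness relative to signal loudness
    Dp = [[d / max(l, eps) for d in row] for row, l in zip(Dc, loud)]
    # q: integration over frequency and time, here a plain mean
    q = sum(sum(row) for row in Dp) / (len(Dp) * n_bands)
    return q, Dc, Dp

# Two frames, two bands: the second frame contains only low-level noise
R = [[1.0, 2.0], [0.0, 0.0]]
R_deg = [[1.1, 4.0], [0.1, 0.1]]
q, Dc, Dp = noise_suppressed_quality(R, R_deg)
```

In this toy input the small differences of about 0.1 fall below the criterion and are zeroed, so only the large difference in the first frame contributes to q.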



EP 1 206 104 A1 









[FIG. 3: block diagram. Blocks: perception model (39), perceptual subtraction (40), noise suppression (42), integration (43). Signals: s'(t) = s(t) ⊕ r(t), R(t,f), R'(t,f), D(t,f), D'(t,f). Other reference numerals: 32a, 32b, 32c, 35, 36, 37, 44.]






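[FIG. 4: flow diagram of the noise suppression. Inputs: D(t,f) (from 40) and R'(t,f) (from 36). Block labels: Integration (f), Noise estimation, DSR. Branch outcomes: Dc(t,f) = D(t,f) and Dc(t,f) = 0. Output: D'(t,f) = DSR(t,f) (to 43). Signals: R'(t). Reference numerals: 45, 46, 48, 49, 50.]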






[FIG. 5: block diagram. Inputs: s(t) and s'(t) = r(t). Block labels: HW, FFT, FW, FS, IV. Signals: Ps(f), R(t,f), R'(t,f). Reference numeral: 35.]






European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 00 20 3936



DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of relevant passages

WO 98 59509 A (ERICSSON TELEFON AB L M)
30 December 1998 (1998-12-30)
* abstract *
* column 4, line 11 - column 5, line 10 *
* column 6, line 2 - column 7, line 30; figure 2 *
* column 8, line 20 - column 9, line 8 *
Relevant to claim: 1,9,10

US 4 677 676 A (ERIKSSON LARRY J)
30 June 1987 (1987-06-30)
* abstract *
Relevant to claim: 1,9,10

VASEGHI S V ET AL: "NOISE COMPENSATION METHODS FOR HIDDEN MARKOV MODEL SPEECH RECOGNITION IN ADVERSE ENVIRONMENTS"
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, US, IEEE INC. NEW YORK,
vol. 5, no. 1, 1997, pages 11-21, XP000785324
ISSN: 1063-6676
* the whole document *
Relevant to claim: .10

Classification of the application (Int.Cl.7): H04M3/32

Technical fields searched (Int.Cl.7): H04M



The present search report has been drawn up for all claims

Place of search: THE HAGUE
Date of completion of the search: 19 April 2001



CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the same category
A : technological background
O : non-written disclosure
P : intermediate document



Examiner: Willems, B



T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing date
D : document cited in the application
L : document cited for other reasons

& : member of the same patent family, corresponding document






ANNEX TO THE EUROPEAN SEARCH REPORT
ON EUROPEAN PATENT APPLICATION NO. EP 00 20 3936

This annex lists the patent family members relating to the patent documents cited in the above-mentioned European search report.
The members are as contained in the European Patent Office EDP file on 19-04-2001.
The European Patent Office is in no way liable for these particulars which are merely given for the purpose of information.



Patent document            Publication    Patent family        Publication
cited in search report     date           member(s)            date

WO 9859509 A               30-12-1998     US 6201960 B         13-03-2001
                                          AU 7950598 A         04-01-1999
                                          BR 9810326 A         05-09-2000

US 4677676 A               30-06-1987     AT 69660 T           15-12-1991
                                          AU 590384 B          02-11-1989
                                          AU 6860487 A         13-08-1987
                                          CA 1281294 A         12-03-1991
                                          DE 3774587 A         02-01-1992
                                          EP 0233717 A         26-08-1987
                                          ES 2028063 T         01-07-1992
                                          JP 2539812 B         02-10-1996
                                          JP 62193310 A        25-08-1987



For more details about this annex: see Official Journal of the European Patent Office, No. 12/82



